fix: enforce WAL writer fencing via conditional PUT #6432
hamersaw wants to merge 1 commit into lance-format:main
WAL fencing was broken: a fenced writer could keep writing WAL entries indefinitely because flush_to_with_index_update used plain put() with no fencing check. Two writers with the same shard ID would silently overwrite each other's WAL entries since both started from the same manifest position.

Replace with conditional PUT fencing:

- WAL entries now use PutMode::Create (put-if-not-exists), which costs the same as regular PUT on S3 (If-None-Match header) but fails atomically if another writer already claimed that position
- New write_fence_barrier() at startup writes an empty WAL entry to claim the next position, scanning forward past any stale entries from old writers
- Removes manifest read from WAL flush hot path (was 4-5 round trips per flush)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
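The put-if-not-exists behavior described above can be illustrated with a minimal in-memory sketch. This is not the PR's code: `MemStore`, `put_if_absent`, `Writer`, and `PutError` are hypothetical names standing in for the object store and the WAL flusher, and a `BTreeMap` plays the role of the bucket. It only shows why two writers that start from the same position can no longer overwrite each other.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an object store with put-if-not-exists semantics.
#[derive(Default)]
struct MemStore {
    objects: BTreeMap<u64, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum PutError {
    /// Another writer already claimed this WAL position.
    AlreadyExists(u64),
}

impl MemStore {
    /// Conditional create: succeeds only if nothing exists at `position`.
    fn put_if_absent(&mut self, position: u64, data: Vec<u8>) -> Result<(), PutError> {
        if self.objects.contains_key(&position) {
            return Err(PutError::AlreadyExists(position));
        }
        self.objects.insert(position, data);
        Ok(())
    }
}

/// Hypothetical writer that tracks the next WAL position it intends to claim.
struct Writer {
    next_position: u64,
}

impl Writer {
    /// Flushes one entry; the position only advances if the claim succeeded.
    fn flush(&mut self, store: &mut MemStore, payload: &[u8]) -> Result<u64, PutError> {
        let position = self.next_position;
        store.put_if_absent(position, payload.to_vec())?;
        self.next_position += 1;
        Ok(position)
    }
}

fn main() {
    let mut store = MemStore::default();
    // Both writers start from the same position, as in the bug description.
    let mut old_writer = Writer { next_position: 0 };
    let mut new_writer = Writer { next_position: 0 };

    // The new writer claims position 0 first.
    assert_eq!(new_writer.flush(&mut store, b"new"), Ok(0));
    // The old writer's flush to the same position is rejected, not overwritten.
    assert_eq!(
        old_writer.flush(&mut store, b"old"),
        Err(PutError::AlreadyExists(0))
    );
    assert_eq!(store.objects[&0], b"new".to_vec());
}
```

With an unconditional put, the second flush would have replaced the first entry silently; with the conditional create, the losing writer gets an error it can treat as "fenced".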
Verification that fencing was broken on
westonpace left a comment
Is a "put not exists" actually free? I thought object stores typically had a much lower throughput for conditional operations (e.g. 10 per second) than unconditional operations.
    // Build an empty WAL entry (schema + epoch metadata, no batches)
    let mut metadata = schema.metadata().clone();
    metadata.insert(WRITER_EPOCH_KEY.to_string(), self.writer_epoch.to_string());
    let schema_with_epoch = ArrowSchema::new_with_metadata(schema.fields().to_vec(), metadata);
Have we confirmed these fence barriers don't bother the read half?
    "Writer fenced: WAL position {} already exists for shard {} \
    (another writer has claimed this position)",
What does the fenced writer do if it's not automatically retrying?
    // Write fence barrier to claim WAL positions and fence zombie writers.
    // Uses PutMode::Create to atomically claim the next WAL position,
    // scanning forward past any entries left by old (fenced) writers.
    wal_flusher.write_fence_barrier(&schema).await?;
It's slightly counterintuitive that a method named open will actually do a write (i.e. it isn't idempotent). Can we document this in the method docs?
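The startup fencing discussed in this thread can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the PR's implementation: `MemWal`, `WalEntry`, `fence_barrier_sketch`, and `open_sketch` are hypothetical names, and the real method is async and takes a schema. It shows the scan-forward loop and one way the doc comment on an open-style method might call out the write.

```rust
use std::collections::BTreeMap;

/// Hypothetical WAL entry; a fence barrier is an entry with no batches.
#[derive(Debug)]
struct WalEntry {
    writer_epoch: u64,
    batches: Vec<Vec<u8>>,
}

/// Hypothetical in-memory WAL keyed by position.
#[derive(Default)]
struct MemWal {
    entries: BTreeMap<u64, WalEntry>,
}

impl MemWal {
    /// Put-if-not-exists: returns Err(position) if the position is taken.
    fn create(&mut self, position: u64, entry: WalEntry) -> Result<(), u64> {
        if self.entries.contains_key(&position) {
            return Err(position);
        }
        self.entries.insert(position, entry);
        Ok(())
    }
}

/// Claims the first free position at or after `start` with an empty entry,
/// skipping entries left by older writers. Returns the next position the
/// caller should use for data.
fn fence_barrier_sketch(wal: &mut MemWal, epoch: u64, start: u64) -> u64 {
    let mut position = start;
    loop {
        let barrier = WalEntry { writer_epoch: epoch, batches: Vec::new() };
        match wal.create(position, barrier) {
            Ok(()) => return position + 1,
            Err(_) => position += 1, // stale entry here; try the next slot
        }
    }
}

/// Opens a writer for a shard (hypothetical).
///
/// Note: despite the name, this writes a fence barrier entry to the WAL,
/// so calling it twice claims two positions. It is not idempotent.
fn open_sketch(wal: &mut MemWal, epoch: u64, manifest_position: u64) -> u64 {
    fence_barrier_sketch(wal, epoch, manifest_position)
}

fn main() {
    let mut wal = MemWal::default();
    // An old writer (epoch 1) left data entries at positions 0 and 1.
    for position in 0..2 {
        let entry = WalEntry { writer_epoch: 1, batches: vec![b"data".to_vec()] };
        wal.create(position, entry).unwrap();
    }

    // A new writer opens from a stale manifest position and scans forward.
    let next = open_sketch(&mut wal, 2, 0);
    assert_eq!(next, 3);
    assert_eq!(wal.entries[&2].writer_epoch, 2);
    assert!(wal.entries[&2].batches.is_empty());

    // The old writer's next flush targets position 2 and is rejected.
    let late = WalEntry { writer_epoch: 1, batches: vec![b"late".to_vec()] };
    assert_eq!(wal.create(2, late), Err(2));
}
```

In this sketch the barrier is what fences the old writer: its next position is already occupied, so its conditional create fails on the first attempt after the new writer opens.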
Summary
- flush_to_with_index_update used plain put() with no fencing check. Two writers with the same shard ID silently overwrote each other's WAL entries.
- WAL entries now use PutMode::Create (put-if-not-exists): same cost as regular PUT on S3 but fails atomically if another writer already claimed that position.
- write_fence_barrier() at startup writes an empty WAL entry to claim the next position, scanning forward past any stale entries from old writers. This immediately fences zombie writers on open.

Test plan
- wal::tests (conditional PUT rejection, fence barrier claiming, barrier skip-forward, barrier blocking old writer, sequential position uniqueness)
- manifest::tests (epoch claiming, check_fenced edge cases, commit_update rejection, three-writer scenario)
- write::tests pass (including E2E correctness test with fence barrier in startup path)

🤖 Generated with Claude Code